Entry Name:  KBSI-BENJAMIN-MC1

VAST Challenge 2015
Mini-Challenge 1

 

 

Team Members:

Perakath Benjamin, Knowledge Based Systems Inc., pbenjamin@kbsi.com (PRIMARY)

Karthic Madanagopal, Knowledge Based Systems Inc., kmadanagopal@kbsi.com

Kumar Akella, Knowledge Based Systems Inc., kakella@kbsi.com

Kalyan Vadakkeveedu, Knowledge Based Systems Inc., kvadakkeveedu@kbsi.com

 

Student Team:  NO

 

Did you use data from both mini-challenges?  NO

 

Analytic Tools Used:

- Intelligence Products Mosaic: This semantic framework supports information discovery, sense making, and presentation in a dynamic, collaborative environment. It reduces ‘data-to-decision’ time through semantic and collaborative visual analytics techniques.

- D3: We built our custom data explorer with the D3.js visualization library, which dynamically generates SVG graphics from data.

- Microsoft SQL Server 2008: To make the exploration scalable, we loaded the MC1 movement data into Microsoft SQL Server and wrote SQL queries to extract useful statistics.

- Microsoft Office Excel®: The statistics extracted with SQL queries were loaded into Excel and analyzed in detail through various plots to validate our hypotheses.

- MATLAB®: We performed data exploration, visualization, and statistical analysis on the MC1 data using algorithms written in MATLAB.

- NodeXL: The plotting capabilities of the NodeXL plugin for Microsoft Excel were used to visualize the data and the results of our analysis algorithms.

Approximately how many hours were spent working on this submission in total?

120 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES

 

Video Download

Video:

http://youtu.be/OWw8QWQH8yY

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

MC1.1 – Characterize the attendance at DinoFun World on this weekend. Describe up to twelve different types of groups at the park on this weekend. 

a.      How big is this type of group?

b.      Where does this type of group like to go in the park?

c.      How common is this type of group?

d.      What are your other observations about this type of group?

e.      What can you infer about this type of group?

f.       If you were to make one improvement to the park to better meet this group’s needs, what would it be?

Limit your response to no more than 12 images and 1000 words.

1a.

We built a visual analytic interface to explore the movement data for all three days and identify the various group types. Figure 1 shows how groups (individuals traversing the park together) of different sizes compare with one another on Saturday.

 

Figure 1: Saturday Visitors by Group Size

Figure 1 contains 20 different group types, ranging in size from 3 to 43. For each group size, we computed the number of instances (distinct occurrences). For example, Group Size 4 has 314 instances, which account for 14.95% of all group instances visiting the park on Saturday. Each segment's area therefore indicates what share of Saturday's groups that type represents. Group Sizes 30 to 43 each account for only 0.05% of instances. Figure 2 and Figure 3 show the group types identified for Friday and Sunday.

Figure 2:  Friday Visitors by Group Size

 

Figure 3:  Sunday Visitors by Group Size

 
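The group identification described in Section 1a can be approximated with a short script. The Python sketch below is illustrative only and is not the SQL and MATLAB code used for this submission: it assumes check-in records are available as hypothetical `(visitor_id, "HH:MM", attraction_id)` tuples for a single day, treats visitors with identical check-in sequences as one group, and then tabulates instances and percentages per group size in the manner of Figures 1 to 3. The function names `detect_groups` and `size_distribution` are hypothetical.

```python
from collections import Counter, defaultdict


def detect_groups(checkins, min_size=2):
    """Group visitors whose full check-in sequences are identical.

    checkins: iterable of (visitor_id, timestamp, attraction_id) tuples for one
    day; timestamps must sort chronologically (zero-padded "HH:MM" works).
    Returns one sorted member list per group with at least min_size members.
    """
    per_visitor = defaultdict(list)
    for visitor, ts, attraction in checkins:
        per_visitor[visitor].append((ts, attraction))

    # Visitors sharing the exact same (time, attraction) sequence form a group.
    by_sequence = defaultdict(list)
    for visitor, seq in per_visitor.items():
        by_sequence[tuple(sorted(seq))].append(visitor)

    return [sorted(m) for m in by_sequence.values() if len(m) >= min_size]


def size_distribution(groups):
    """Map group size -> (number of instances, percent of all instances)."""
    counts = Counter(len(g) for g in groups)
    total = sum(counts.values())
    return {size: (n, round(100.0 * n / total, 2))
            for size, n in sorted(counts.items())}


# Synthetic example: visitors 1-3 and 4-5 move together; 6 is alone.
checkins = [
    (1, "08:00", 5), (1, "09:10", 8),
    (2, "08:00", 5), (2, "09:10", 8),
    (3, "08:00", 5), (3, "09:10", 8),
    (4, "08:30", 13), (5, "08:30", 13),
    (6, "10:00", 7),
]
groups = detect_groups(checkins)
print(groups)                     # [[1, 2, 3], [4, 5]]
print(size_distribution(groups))  # {2: (1, 50.0), 3: (1, 50.0)}
```

Exact matching is strict: a member who skips a single check-in ends up outside the group, so a more tolerant rule (for example, requiring most check-ins to coincide) would be needed to surface anomalies like those discussed under MC1.3.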

1b.

Continuing with Saturday’s data, we investigated which attraction categories Group Size 4 is most and least likely to visit. Figure 4 indicates that the most popular attraction category is Thrill Rides with 1200 visits, followed by Shows and Entertainment with 1000 visits and Kiddie Rides with 200 visits.

Figure 4:  Group Size 4 Preference Data for Saturday
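Figure 4’s per-category totals amount to counting check-ins by attraction category. A minimal Python sketch, assuming hypothetical `(group_id, attraction_id)` visit records for Group Size 4 and a hypothetical attraction-to-category mapping (the labels for attractions 1, 5, 8, and 13 follow Section 1f); `category_visits` is an illustrative name.

```python
from collections import Counter


def category_visits(visits, category_of):
    """Count visits per attraction category, most visited first.

    visits: iterable of (group_id, attraction_id) records for one group size
    category_of: dict mapping attraction_id -> category name
    """
    counts = Counter(category_of[attraction] for _, attraction in visits)
    return counts.most_common()


# Hypothetical mapping and synthetic visit records.
category_of = {1: "Thrill Rides", 5: "Thrill Rides", 8: "Thrill Rides",
               13: "Kiddie Rides"}
visits = [("g1", 5), ("g1", 8), ("g1", 13), ("g2", 1), ("g2", 5)]
ranked = category_visits(visits, category_of)
print(ranked)      # [('Thrill Rides', 4), ('Kiddie Rides', 1)]
print(ranked[-1])  # least visited category: ('Kiddie Rides', 1)
```

Categories that never appear in the records are absent from the result, so the least-visited category should be read with that in mind.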

1c.

In Saturday’s data, the numbers of instances for Group Size 3 and Group Size 4 are comparable, at 18.05% and 14.95% respectively, as shown in Figure 5. Likewise, Group Sizes 30 to 43 are comparable at 0.05% each. The larger the group size, the less likely it is to occur in the dataset. Together, Group Sizes 3 and 4 account for 33.00% of instances, so roughly one in three groups is of size 3 or 4.

Figure 5:  Saturday Visitors Grouped by Size showing Relative Frequency
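As a rough consistency check on these percentages, assuming (as stated in Section 1a) that the 314 instances of Group Size 4 make up 14.95% of Saturday’s group instances, the implied total number of instances N, the share of a single instance, and the combined share of Group Sizes 3 and 4 are:

```latex
N \approx \frac{314}{0.1495} \approx 2100,
\qquad
\frac{1}{N} \approx \frac{1}{2100} \approx 0.048\% \approx 0.05\%,
\qquad
14.95\% + 18.05\% = 33.00\% \approx \tfrac{1}{3}.
```

This suggests that each of Group Sizes 30 to 43 corresponds to roughly one group instance on Saturday.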

1d.

The box plot in Figure 6 compares Group Size 4 with Group Sizes 2, 3, and 5 for Saturday’s data. Instances of these group sizes typically visit 16 rides. Likewise, instances of Group Sizes 6 and 7 typically visit 23 rides. Similar observations hold for the other group sizes. 

Within every group type, individual members move together and visit the same sequence of attractions, except for a few anomalous instances identified in Figure 10.

Figure 6:  Comparison of Group Size 4 with Groups of Different Sizes
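The typical ride counts read off Figure 6 are box-plot statistics. A minimal Python sketch of how such a summary could be computed, assuming hypothetical `(group_size, rides_visited)` pairs with one entry per group instance; `rides_summary` is an illustrative name. Quartiles come from the standard library’s `statistics.quantiles` with `method="inclusive"`; other plotting tools may use slightly different quartile conventions.

```python
import statistics
from collections import defaultdict


def rides_summary(instances):
    """Box-plot summary of rides visited per group instance, by group size.

    instances: iterable of (group_size, rides_visited) pairs
    Returns {group_size: (min, q1, median, q3, max)}.
    """
    by_size = defaultdict(list)
    for size, rides in instances:
        by_size[size].append(rides)

    summary = {}
    for size, values in sorted(by_size.items()):
        if len(values) >= 2:
            q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
        else:  # a single instance: all quartiles equal that value
            q1 = med = q3 = values[0]
        summary[size] = (min(values), q1, med, q3, max(values))
    return summary


# Synthetic example, not the park data.
data = [(4, 10), (4, 12), (4, 12), (4, 14), (4, 16), (7, 20)]
print(rides_summary(data))
# {4: (10, 12.0, 12.0, 14.0, 16), 7: (20, 20, 20, 20, 20)}
```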

1e.

Further analysis of Saturday’s data for Group Size 4 (Figure 7) shows that the most preferred attraction type for individuals is Thrill Rides, while the least preferred is either Kiddie Rides or Rides for Everyone. Of the 314 instances, plots are presented for four; the same trend was evident in a large number of the other instances.

 

Figure 7:  Preferred Attraction Category for Group Size 4
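The per-instance preferences in Figure 7 reduce to finding the most and least frequent category among each group instance’s visits. A minimal Python sketch, assuming a hypothetical attraction-to-category mapping; the example route is Instance #4’s sequence from Section 1f, with the category labels given there. `preference` is an illustrative name.

```python
from collections import Counter


def preference(route, category_of, categories):
    """Return (most_preferred, least_preferred) category for one group instance.

    route: attraction IDs visited by the instance (entry point excluded)
    category_of: dict mapping attraction_id -> category name
    categories: every category to consider; unvisited ones count as zero
    Ties resolve to whichever category comes first in `categories`.
    """
    counts = Counter(category_of[a] for a in route)
    most = max(categories, key=lambda c: counts[c])
    least = min(categories, key=lambda c: counts[c])
    return most, least


categories = ["Thrill Rides", "Kiddie Rides", "Rides for Everyone",
              "Shows and Entertainment"]
category_of = {1: "Thrill Rides", 5: "Thrill Rides", 7: "Thrill Rides",
               8: "Thrill Rides", 13: "Kiddie Rides"}
print(preference([5, 8, 13, 5, 7, 1], category_of, categories))
# ('Thrill Rides', 'Rides for Everyone')
```

Because unvisited categories tie at zero, the least-preferred result depends on tie-breaking; reporting all tied categories may be more faithful to the data.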

 

 

1f.

Figure 8 plots the movement of one instance of Group Size 4 (Instance #4) from Saturday’s data. The traverse diagram on the left shows that this group instance travelled together throughout its time in the park, as indicated by the pink box at each attraction ID. The group started at the entry point (0), moved to attraction 5 (thrill ride), continued to attraction 8 (thrill ride), then to attraction 13 (kiddie ride), returned to attraction 5 (thrill ride), went on to attraction 7 (thrill ride), and ended at attraction 1 (thrill ride). In addition to the attraction sequence, the figure shows the distance travelled between attractions through link thickness: the thicker the link, the longer the distance. The location map on the right shows the corresponding attraction sites on the park guide map. To improve the customer experience, we suggest co-locating the Thrill Rides rather than dispersing them geographically; for instance, the location map shows that the distance from attraction 5 to attraction 8 is relatively large compared with other pairs of attractions.

Figure 8:  Group Movement Data for a Group Instance
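The link thickness in Figure 8 encodes the length of each leg of the route. A minimal Python sketch of that computation, assuming each attraction has an (x, y) grid position; the coordinates below are made up for illustration and are not the park’s actual positions. `leg_distances` is an illustrative name.

```python
import math


def leg_distances(route, coords):
    """Euclidean length of each consecutive leg of a route.

    route: attraction IDs in visiting order (0 = entry point)
    coords: dict mapping attraction_id -> (x, y) position
    Returns a list of ((from_id, to_id), distance) tuples.
    """
    return [((a, b), math.dist(coords[a], coords[b]))
            for a, b in zip(route, route[1:])]


# Instance #4's route from Section 1f with made-up coordinates.
route = [0, 5, 8, 13, 5, 7, 1]
coords = {0: (0, 0), 5: (0, 30), 8: (40, 60), 13: (40, 50),
          7: (10, 40), 1: (20, 20)}
legs = leg_distances(route, coords)
longest = max(legs, key=lambda leg: leg[1])
print(longest)  # ((5, 8), 50.0)
```

With real coordinates, summing the leg lengths between Thrill Rides would give one way to compare candidate layouts for the co-location suggestion.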

 

 

MC1.2 – Are there notable differences in the patterns of activity in the park across the three days? Please describe the notable differences you see.

 

Limit your response to no more than 3 images and 300 words.

 

2.

Visitor patterns at locations 32, 63, and 64 differ across the three days, as shown in Figure 9. For instance, on Sunday all visitor recordings at location 32 drop to zero after 11:59 am, and the same behavior appears at location 63 after 10:59 am. Visitor recordings at location 64 are also slightly higher on Sunday than on Saturday for the most part. Saturday appears to have higher visitor attendance at locations 32 and 63 than Friday and Sunday.

 

Figure 9:  Visitor Patterns for Locations 32, 63 and 64
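The drop-offs in Figure 9 can be located by bucketing check-ins by day, location, and hour. A minimal Python sketch, assuming hypothetical `(day, location_id, "HH:MM")` check-in records; the example records are synthetic, and `hourly_counts` and `last_active_hour` are illustrative names.

```python
from collections import Counter


def hourly_counts(records):
    """Count check-ins per (day, location_id, hour).

    records: iterable of (day, location_id, "HH:MM") tuples
    """
    return Counter((day, loc, int(t.split(":")[0])) for day, loc, t in records)


def last_active_hour(counts, day, location):
    """Latest hour with at least one check-in at location on day, else None."""
    hours = [h for (d, loc, h), n in counts.items()
             if d == day and loc == location and n > 0]
    return max(hours, default=None)


# Synthetic records, not the park data.
records = [("Sun", 32, "09:15"), ("Sun", 32, "11:40"),
           ("Sun", 63, "10:05"), ("Sat", 32, "15:30")]
counts = hourly_counts(records)
print(last_active_hour(counts, "Sun", 32))  # 11
print(last_active_hour(counts, "Sun", 63))  # 10
print(last_active_hour(counts, "Fri", 32))  # None
```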

 

MC1.3 – What anomalies or unusual patterns do you see? Describe no more than 10 anomalies, and prioritize those unusual patterns that you think are most likely to be relevant to the crime.

 

Limit your response to no more than 10 images and 500 words.

 

3.

In Friday’s data, Group Size 10 has 11 instances. As shown in Figure 10, the members of every group instance except #6 move through the park in sync (visiting the same attractions in the same sequence) for the whole day. In Group Instance 6, member 410025 missed attraction 27 at 15:04, one of the 23 attractions in the group’s sequence, while the other members visited it. Similar anomalies were observed on Saturday for member 565489 (Group Size 10, Group Instance 11), who missed attraction 5 at 11:06 am, and for member 810466 (Group Size 10, Group Instance 9), who missed attraction 81 at 18:27.

 

Figure 10:  Anomalies for Group Size 10
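The anomalies above amount to a member lacking a check-in that the rest of the group made. A minimal Python sketch of such a check, assuming group membership has already been established by a tolerant matching rule (an exact-sequence match would split off the member who skipped an attraction) and that each member’s check-ins are available as a set of `(timestamp, attraction_id)` pairs. The member IDs and times in the example are synthetic, and `missed_checkins` and its `min_share` threshold are hypothetical.

```python
from collections import Counter


def missed_checkins(group, min_share=0.5):
    """Find members who skipped a check-in that most of their group made.

    group: dict mapping member_id -> set of (timestamp, attraction_id)
    min_share: a check-in counts as a group check-in when more than this
               fraction of members made it (hypothetical threshold)
    Returns {member_id: sorted list of missed (timestamp, attraction_id)}.
    """
    size = len(group)
    shared = Counter(c for checkins in group.values() for c in checkins)
    group_events = {c for c, n in shared.items() if n / size > min_share}

    missed = {}
    for member, checkins in group.items():
        gaps = group_events - checkins
        if gaps:
            missed[member] = sorted(gaps)
    return missed


# Synthetic group: member 103 skips the 15:04 check-in at attraction 27.
group = {
    101: {("14:10", 3), ("15:04", 27), ("16:20", 9)},
    102: {("14:10", 3), ("15:04", 27), ("16:20", 9)},
    103: {("14:10", 3), ("16:20", 9)},
}
print(missed_checkins(group))  # {103: [('15:04', 27)]}
```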